Monte Carlo Analysis (Part 1)

Edward Vytlacil

Monte Carlo Analysis

  • Use computer simulation to stochastically approximate some quantity that is difficult or infeasible to calculate analytically.

MC Methods in Economics

MC simulation is used broadly across economics:

  • Simulation of economic models — DSGE models, dynamic games in IO, trade models with many countries/sectors. Simulate agent behavior and compute equilibrium outcomes when closed-form solutions are unavailable.
  • Structural estimation — simulated method of moments, indirect inference, simulated maximum likelihood. Approximate intractable likelihood functions or moment conditions by simulation over unobservables.

MC Methods in Economics (cont’d)

  • Bayesian econometrics — Markov Chain Monte Carlo (MCMC) methods to approximate posterior distributions. Widely used in Bayesian macro and microeconometrics.
  • Computationally intensive methods for inference — such as the bootstrap and randomization inference.
  • Evaluating econometric methods — assess finite-sample properties of estimators, confidence intervals, and tests when analytical results are unavailable or when asymptotic approximations may be unreliable.

Today:

  • Generate random draws in R;

  • Approximate the probability of some event by simulation;

  • Evaluate expected value of some function by simulation;

Next Lecture:

  • Evaluate econometric methods by simulation.

We will return to computationally intensive methods for inference later in the course.

Simulate Rolling a Die

  • Simulating one roll of a die:

    • R code: sample(1:6,1), which randomly draws one element from {1,2,3,4,5,6}, with each element equally likely.
sample(1:6,1)
[1] 1

Simulate Rolling a Die

  • Rolls are random, each time can get different results:
sample(1:6,1)
[1] 5
sample(1:6,1)
[1] 1
sample(1:6,1)
[1] 5
sample(1:6,1)
[1] 2

Are results really random?

  • Computers are deterministic.

    • To create truly random numbers, can use a physical process, e.g., from nuclear decay of radioisotopes.

    • Not necessary for our purposes, and not what R is doing…

    • R is generating pseudo-random numbers

R generating pseudo-random numbers

  • R is generating pseudo-random numbers

    • the starting point is a seed, which R by default derives from the computer’s real-time clock and process ID;
    • an algorithm is applied to the seed to generate the sequence of pseudo-random numbers;
    • not actually random, since deterministic given the seed, but close enough to random for our purposes.

Replication?

  • You may obtain different pseudo-random numbers each time you run your code, so neither you nor others can replicate your results.

  • For replication purposes, set the seed explicitly with set.seed()

set.seed(5551)
sample(1:6,1)
[1] 5
sample(1:6,1)
[1] 3
sample(1:6,1)
[1] 1
set.seed(5551)
sample(1:6,1)
[1] 5
sample(1:6,1)
[1] 3
sample(1:6,1)
[1] 1

Can draw from . . .

  • Common distributions with built-in generators in R:

    Distribution Function Returns
    Binomial rbinom(n,size,prob) draw \(n\) times from \(\mbox{Binom}(\mbox{size},\mbox{prob})\).
    Normal rnorm(n,mean=0,sd=1) draw \(n\) times from \(N(\mbox{mean},\mbox{sd})\).
    Student-t rt(n,df) draw \(n\) times from \(t_{df}\) distribution.
    Uniform runif(n,min=0,max=1) draw \(n\) times from \(\mbox{Unif}[\mbox{min},\mbox{max}]\).
    Poisson rpois(n,lambda) draw \(n\) times from \(\mbox{Pois}(\lambda)\).
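
A quick illustration of a few of these generators (the seed value 123 is chosen arbitrarily, for reproducibility):

```r
set.seed(123)                     # for reproducibility
rbinom(3, size = 10, prob = 0.5)  # three draws from Binom(10, 0.5)
rnorm(3)                          # three draws from N(0, 1)
runif(3)                          # three draws from Unif[0, 1]
rpois(3, lambda = 4)              # three draws from Pois(4)
```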

What if no built-in generator exists?

Theorem. (Inverse Probability Transform) Let \(U \sim \text{Unif}(0,1)\) and let \(F\) be a CDF with generalized inverse \[F^{-1}(u) = \inf\{x : F(x) \ge u\}.\] Then \(X = F^{-1}(U)\) has distribution \(F\).

What if … (cont’d)

Theorem. (Inverse Probability Transform) Let \(U \sim \text{Unif}(0,1)\) and let \(F\) be a CDF with generalized inverse \[F^{-1}(u) = \inf\{x : F(x) \ge u\}.\] Then \(X = F^{-1}(U)\) has distribution \(F\).

  • Implication: To draw from any distribution with known \(F^{-1}\), draw \(U \sim \text{Uniform}(0,1)\) and compute \(F^{-1}(U)\).

  • Use when \(F^{-1}\) is available in closed form and no built-in generator exists.

    • (custom distributions, truncated distributions, mixture components).
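
As a sketch of the truncated-distribution case: to draw from a standard normal truncated to \([a,b]\), note that pnorm and qnorm play the roles of \(F\) and \(F^{-1}\). The helper name rtruncnorm01 below is illustrative, not a built-in:

```r
# If U ~ Unif(pnorm(a), pnorm(b)), then qnorm(U) has the N(0,1)
# distribution truncated to [a, b].
rtruncnorm01 <- function(n, a, b) {
  u <- runif(n, min = pnorm(a), max = pnorm(b))
  qnorm(u)
}

set.seed(1)
x <- rtruncnorm01(10000, a = 0, b = 2)
range(x)   # all draws lie in [0, 2]
```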

Ex: Drawing from Pareto Distribution

  • Pareto distribution: \(F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha\) for \(x \ge x_m > 0\).

  • Inverting: \(F^{-1}(u) = \frac{x_m}{(1-u)^{1/\alpha}}\).

  • For \(B=100,000\) draws from \(\text{Pareto}(\alpha=2.5, x_m=1)\) by inverse CDF method:

alpha <- 2.5; x_m <- 1;  
B <- 100000

# Draw via inverse CDF
U <- runif(B)
X_pareto <- x_m / (1 - U)^(1/alpha)
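
A quick sanity check on draws produced this way (seed chosen arbitrarily): the empirical frequency of \(X \le 2\) should match \(F(2) = 1 - (1/2)^{2.5} \approx 0.823\) up to MC error.

```r
alpha <- 2.5; x_m <- 1
B <- 100000
set.seed(42)

U <- runif(B)
X_pareto <- x_m / (1 - U)^(1/alpha)

p_true <- 1 - (x_m / 2)^alpha   # theoretical F(2), about 0.823
p_hat  <- mean(X_pareto <= 2)   # empirical frequency
abs(p_hat - p_true)             # on the order of 1/(2*sqrt(B))
```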

More general principle:

  • Constructive (algorithmic) simulation: if you can describe the process that generates \(X\) as a sequence of steps involving simple random primitives (coin flips, uniform draws, normal draws, …), simulate \(X\) by executing that algorithm directly.

  • Does not require knowing the CDF of \(X\) — only requires knowing the mechanism that produces \(X\).

  • This is often the natural approach to simulate economic models: we specify primitives of a model (preferences, technology, shocks) and simulate outcomes.

Constructive Simulation:

Suppose a r.v. \(X\) is defined as a function of a (possibly random-length) sequence of simpler r.v.s: \[X = g(Z_1, Z_2, \ldots)\] where each \(Z_i\) is easy to simulate. Then to draw \(X\):

  1. Draw \(Z_1, Z_2, \ldots\) according to their specified distribution.
  2. Compute \(X = g(Z_1, Z_2, \ldots)\).
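
A minimal sketch of this recipe for a compound sum \(X = \sum_{i=1}^{N} Z_i\) with \(N \sim \text{Pois}(\lambda)\) and \(Z_i \sim N(\mu,\sigma^2)\); the function name draw_compound and the parameter values are illustrative. By Wald's identity, \(\mathbb{E}[X] = \lambda\mu\), which gives a check on the simulation:

```r
draw_compound <- function(lambda, mu, sigma) {
  n <- rpois(1, lambda)                  # step 1: draw the random length N
  if (n == 0) return(0)
  sum(rnorm(n, mean = mu, sd = sigma))   # step 2: compute X = g(Z_1, ..., Z_N)
}

set.seed(7)
x <- numeric(10000)
for (b in 1:10000) x[b] <- draw_compound(lambda = 3, mu = 2, sigma = 1)
mean(x)   # close to lambda * mu = 6
```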

Example: Stopping Time (Coin Flips)

Let \(X\) = number of fair coin flips until 4 heads in a row.

  • \(X\) is a well-defined random variable, but its PMF has no simple closed form.

  • But we know exactly how to generate \(X\): flip coins and count.

Example: Stopping Time (Coin Flips)

run_until_4H <- function() {
  consec <- 0; flips <- 0
  while (consec < 4) {
    flips <- flips + 1
    if (sample(0:1, 1) == 1) {
      consec <- consec + 1
    } else {
      consec <- 0
    }
  }
  flips
}

run_until_4H()
[1] 9

How to draw \(B\) independent copies of \(X\)?

Example: replicate

  • replicate(B, expr) evaluates expr \(B\) times and returns a vector of the results.

  • Using replicate with our previously defined function run_until_4H() to draw \(B\) independent copies of \(X\):

B <- 100000
X_stop <- replicate(B, run_until_4H())

A Note on Loops vs. Vectorization in R

  • R is an interpreted language with high per-iteration overhead — scalar for/while loops can be very slow.

  • Vectorized operations (operating on entire vectors at once) call optimized C/Fortran code under the hood and are typically orders of magnitude faster.

A Note … (cont’d)

# Slow: scalar loop
system.time({
  x <- numeric(1e6)
  for (i in 1:1e6) x[i] <- rnorm(1)
})
   user  system elapsed 
  0.317   0.007   0.325 
# Fast: vectorized
system.time({
  x <- rnorm(1e6)
})
   user  system elapsed 
  0.014   0.001   0.014 
  • Rule of thumb for MC simulation in R: vectorize the inner computations where possible; use replicate (or sapply/vapply) for the outer loop over MC replications.

A Note … (cont’d)

  • Our Pareto example was fully vectorized — x_m / (1 - runif(B))^(1/alpha) applies the inverse CDF element-wise to an entire vector of uniform draws. No loop or replicate needed.

  • Our run_until_4H() requires a loop (random stopping time — we don’t know how many draws are needed in advance).

  • Not everything can be vectorized. But when the number of draws is fixed, vectorize aggressively.

Further Examples

The same principle applies to more complex settings:

  • Order statistics: draw \(Y_1,\ldots,Y_N \stackrel{i.i.d.}{\sim} F\), return \(X = Y_{(k)}\).

  • First passage times: simulate a random walk \(S_n = \sum_{i=1}^n Z_i\), return \(X = \inf\{n : S_n > c\}\).

  • Estimators and test statistics: draw a sample from a DGP, compute \(\hat{\beta}\) or a \(t\)-statistic.
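
The first-passage case can be sketched directly. Here the increments are \(N(0.5, 1)\), an assumed positive drift so the walk crosses the threshold in finite expected time, and first_passage is an illustrative name; Wald's identity suggests \(\mathbb{E}[X] \approx c/0.5 = 2c\):

```r
# X = first n such that S_n = Z_1 + ... + Z_n exceeds the threshold
first_passage <- function(thresh, drift = 0.5) {
  s <- 0; n <- 0
  while (s <= thresh) {
    n <- n + 1
    s <- s + rnorm(1, mean = drift)   # increment Z_n ~ N(drift, 1)
  }
  n
}

set.seed(11)
X_fp <- replicate(2000, first_passage(thresh = 10))
mean(X_fp)   # roughly thresh / drift = 20
```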

Further Examples (cont’d)

  • Auction outcomes: draw bidder valuations from \(F\), compute equilibrium bids, return winning bid or seller revenue.

  • Search models: worker draws wage offers each period, accepts or continues searching according to a reservation wage rule. Simulate time to employment, accepted wage, etc.

  • Firm Exit: simulate profit shocks and exit decision, return firm lifespan.

We next turn to using MC to approximate probabilities and expectations.

Approximate Probability of an Event

  • For given distribution \(F\) and set \(A\), approximate \(\mathbb{P}_F[X \in A]\) by \[\frac{1}{B}\sum_{b=1}^B \mathbf{1}\{X_b \in A\}, \qquad X_1,\ldots,X_B \stackrel{i.i.d.}{\sim} F.\]

  • By WLLN, \[\frac{1}{B}\sum_{b=1}^B \mathbf{1}\{X_b \in A\} \xrightarrow{P} \mathbb{P}_F[X \in A]\qquad \text{as} \qquad B \to \infty.\]
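
For example, approximating \(\mathbb{P}[Z > 1.96]\) for \(Z \sim N(0,1)\), where the exact value is available from pnorm, lets us check the MC answer:

```r
set.seed(2024)
B <- 100000
Z <- rnorm(B)

mean(Z > 1.96)    # MC approximation of P(Z > 1.96)
1 - pnorm(1.96)   # exact value, about 0.025
```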

Approximate Probability of an Event

For example, what is the probability that it takes fewer than 20 flips to get 4 heads in a row?

  • \(X\) = number of flips until 4 consecutive heads, \(A = \{x : x < 20\}\).
B <- 100000
X_stop <- replicate(B, run_until_4H())
mean(X_stop < 20)
[1] 0.45621

MC Accuracy: How Large Should \(B\) Be?

  • Let \(p = \mathbb{P}_F[X \in A]\) and \(\hat{p}_B = \frac{1}{B}\sum_{b=1}^B \mathbf{1}\{X_b \in A\}\).

  • Each \(\mathbf{1}\{X_b \in A\} \stackrel{i.i.d.}{\sim} \text{Bernoulli}(p)\), so \(B\,\hat{p}_B \sim \text{Binomial}(B, p)\).

  • Therefore:

    • \(\mathbb{E}[\hat{p}_B] = p\) (unbiased).
    • \(\text{Var}(\hat{p}_B) = \frac{p(1-p)}{B}\).
    • \(\text{SD}(\hat{p}_B) = \sqrt{\frac{p(1-p)}{B}}\), the MC standard error.

MC Accuracy . . . (cont’d)

  • \(\text{SD}(\hat{p}_B) = \sqrt{\frac{p(1-p)}{B}}\), the MC standard error.

  • Since \(p(1-p) \le 1/4\), the MC standard error is at most \(\frac{1}{2\sqrt{B}}\), regardless of \(p\).

    \(B\) Max MC Std. Error
    \(1{,}000\) \(0.0158\)
    \(10{,}000\) \(0.0050\)
    \(100{,}000\) \(0.0016\)
    \(1{,}000{,}000\) \(0.0005\)
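
The table entries are simply \(\frac{1}{2\sqrt{B}}\), easy to verify in R:

```r
B_vals <- c(1e3, 1e4, 1e5, 1e6)
round(1 / (2 * sqrt(B_vals)), 4)   # 0.0158 0.0050 0.0016 0.0005
```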

MC Accuracy: Distribution of \(\hat{p}_B\)

  • As \(B\) grows, \(\hat{p}_B\) concentrates around \(p\) — the LLN in action.

MC Accuracy: Repeated Simulations

What if we repeat the entire MC exercise \(R = 1{,}000\) times, each with \(B\) replications?

  • Increasing \(B\) by a factor of 100 reduces the MC standard error by a factor of 10.

MC Accuracy: Choosing \(B\)

  • By CLT, an approximate \(95\%\) confidence interval for \(p\) based on MC simulation: \[\hat{p}_B \;\pm\; 1.96\,\sqrt{\frac{\hat{p}_B(1-\hat{p}_B)}{B}}.\]

  • To achieve MC accuracy of \(\pm \epsilon\) with \(95\%\) confidence, need \[B \ge \frac{(1.96)^2 \, p(1-p)}{\epsilon^2}.\] Since \(p(1-p) \le 1/4\), a worst-case sufficient choice is \[B \ge \frac{(1.96)^2}{4\,\epsilon^2} \approx \frac{1}{\epsilon^2}.\]
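
A sketch applying both formulas to the stopping-time probability from earlier (run_until_4H as defined above; a smaller \(B\) than in the slides keeps the run fast):

```r
# run_until_4H() as defined earlier
run_until_4H <- function() {
  consec <- 0; flips <- 0
  while (consec < 4) {
    flips <- flips + 1
    if (sample(0:1, 1) == 1) consec <- consec + 1 else consec <- 0
  }
  flips
}

set.seed(5551)
B <- 20000
p_hat <- mean(replicate(B, run_until_4H()) < 20)

# Approximate 95% CI for p
p_hat + c(-1, 1) * 1.96 * sqrt(p_hat * (1 - p_hat) / B)

# Worst-case B for accuracy of +/- 0.001
1.96^2 / (4 * 0.001^2)   # about 960,400, roughly 1/epsilon^2
```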

MC Accuracy: Choosing \(B\) (cont’d)

  • To achieve MC accuracy of \(\pm \epsilon\) with \(95\%\) confidence requires approximately \(B \approx 1/\epsilon^2\) replications:

    Desired accuracy (\(\epsilon\)) Required \(B\) (worst case)
    \(\pm 0.01\) \(\approx 10{,}000\)
    \(\pm 0.001\) \(\approx 1{,}000{,}000\)
    \(\pm 0.0001\) \(\approx 100{,}000{,}000\)
  • Diminishing returns: each additional decimal of accuracy costs \(100\times\) more replications.

MC Accuracy vs. Statistical Estimation

  • The math is identical: for i.i.d. draws, \(\text{Var}(\bar{X}_B) = \sigma^2/B\) whether the draws come from a MC simulation or from sampling a population.

  • The key practical difference: in statistical estimation, the sample size \(n\) is typically fixed by the data you have (or can afford to collect). In MC simulation, \(B\) is a choice variable — you can always reduce MC error by running more replications.

MC Accuracy vs. … (cont’d)

  • Implication: MC approximation error is not a fundamental limitation. If your MC standard error is too large, increase \(B\) and wait longer. The only cost is computation time.

  • This is why we distinguish between:

    • Statistical uncertainty (from having a finite sample from the population), and
    • MC uncertainty (from using a finite number of simulation draws), which can be made arbitrarily small.

Approximating \(\mathbb{E}_F[g(X)]\)

  • More generally, approximate \(\mathbb{E}_F[g(X)]\) by \[\frac{1}{B}\sum_{b=1}^B g(X_b), \qquad X_1,\ldots,X_B \stackrel{i.i.d.}{\sim} F.\]

  • Approximating \(\mathbb{P}_F[X \in A]\) is a special case with \(g(x) = \mathbf{1}\{x \in A\}\).

  • By WLLN, \(\frac{1}{B}\sum_{b=1}^B g(X_b) \xrightarrow{P} \mathbb{E}_F[g(X)]\) as \(B \to \infty\),

    provided \(\mathbb{E}_F[|g(X)|] < \infty\).

Approximating \(\mathbb{E}_F[g(X)]\) (cont’d)

  • The MC standard error is now \(\text{SD}(g(X))/\sqrt{B},\) estimated by \(\hat{\sigma}_g / \sqrt{B}\) where \[\hat{\sigma}_g^2 = \frac{1}{B-1}\sum_{b=1}^B \left(g(X_b) - \bar{g}\right)^2, \qquad \bar{g}= \frac{1}{B} \sum_{b=1}^B g(X_b) .\]

  • The earlier formula is the special case for \(g = \mathbf{1}_A\).

  • As before, accuracy improves at rate \(1/\sqrt{B}\), but now the constant depends on \(\text{Var}(g(X))\).
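
For instance, the MC approximation of \(\mathbb{E}[X]\) for Pareto draws (redrawn here with an arbitrary seed), together with its estimated MC standard error; the true mean is \(\alpha x_m/(\alpha - 1) = 5/3\) for \(\alpha = 2.5\), \(x_m = 1\), which gives a check:

```r
alpha <- 2.5; x_m <- 1
B <- 100000
set.seed(99)
X <- x_m / (1 - runif(B))^(1/alpha)   # inverse-CDF Pareto draws

g_bar <- mean(X)           # MC approximation of E[X]
mc_se <- sd(X) / sqrt(B)   # estimated MC standard error, sigma_hat_g / sqrt(B)
c(estimate = g_bar, mc_se = mc_se, truth = alpha * x_m / (alpha - 1))
```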

Example: \(\mathbb{E}[X]\) for Stopping Time

Using our earlier \(100{,}000\) draws of \(X\) = flips until 4 consecutive heads:

c(mean    = mean(X_stop),
  E_X_sq  = mean(X_stop^2),
  var     = var(X_stop),
  P_lt_20 = mean(X_stop < 20))
   mean  E_X_sq     var P_lt_20 
  30.00 1600.00  740.00    0.46 
  • Three of these are MC approximations of \(\mathbb{E}_F[g(X)]\) for different choices of \(g\):
    • \(g(x) = x\), \(\;g(x) = x^2\), \(\;g(x) = \mathbf{1}\{x < 20\}\).
    • All moments of \(X\) are finite \(\Rightarrow\) WLLN applies in each case.
  • The variance is a continuous function of \((\mathbb{E}[X], \mathbb{E}[X^2])\). The MC approximation of each expectation is consistent by the WLLN, so the MC approximation of the variance is consistent by the CMT.

Example: Convergence of \(\bar{X}_B\)

Plotting the cumulative average \(\bar{X}_B = \frac{1}{B}\sum_{b=1}^B X_b\) as \(B\) grows:

  • \(\bar{X}_B\) converges to \(\mathbb{E}[X] = 30\) — the LLN at work.

What Can Go Wrong: The Cauchy Distribution

The Cauchy distribution has density \[f(x) = \frac{1}{\pi(1 + x^2)}, \qquad x \in \mathbb{R}.\]

  • Symmetric around zero, looks similar to a standard normal but with much heavier tails.

  • Key property: \(\mathbb{E}[|X|] = \infty\), so \(\mathbb{E}[X]\) does not exist.

    • \(\Rightarrow\) WLLN does not apply to \(\bar{X}_B\).

Cauchy: Cumulative Average

  • \(\bar{X}_B\) does not settle down — it keeps jumping erratically, even at \(B = 100{,}000\).

Side by Side: LLN Holds vs. Fails

  • Left: 50 independent MC runs — all converge to \(\mathbb{E}[X] = 30\).
  • Right: 50 independent MC runs — no convergence. The paths continue to wander.

Cauchy: Why the Sample Mean Fails

Across \(1{,}000\) draws of \(\bar{X}_{10{,}000}\), each based on \(10{,}000\) i.i.d. Cauchy r.v.s:

cauchy_means <- replicate(1000,
                  mean(rcauchy(10000)))
c(var = var(cauchy_means),
  min = min(cauchy_means),
  max = max(cauchy_means))
  var   min   max 
65000  -210  8000 
  • The variance of \(\bar{X}_B\) does not shrink with \(B\).

  • One can show that the sample average of i.i.d. Cauchy r.v.s is itself Cauchy — averaging provides no improvement whatsoever.
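
One way to see this in simulation (a sketch, seed arbitrary): the quartiles of the standard Cauchy are \(\pm 1\), and the quartiles of the sample means match them no matter how large \(B\) is.

```r
set.seed(3)
cauchy_means <- replicate(1000, mean(rcauchy(10000)))

# Quartiles of sample means of B = 10,000 Cauchy draws ...
quantile(cauchy_means, c(0.25, 0.75))
# ... are near -1 and +1, the quartiles of a single standard Cauchy draw
quantile(rcauchy(1000), c(0.25, 0.75))
```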

Takeaway

  • MC approximation of \(\mathbb{E}_F[g(X)]\) via \(\frac{1}{B}\sum_{b=1}^B g(X_b)\) is:

    • Justified by the WLLN when \(\mathbb{E}[|g(X)|] < \infty\).
    • Not valid when \(\mathbb{E}[|g(X)|] = \infty\).
  • The Cauchy example is a cautionary tale, but also illustrates a practical point: always check whether the moments you are trying to approximate actually exist.

Diagnostics for Moment Problems

In practice, you may not know whether \(\mathbb{E}[|g(X)|] < \infty\). Some warning signs:

  • Instability across runs: re-run the MC simulation with different seeds. If \(\bar{g}_B\) varies wildly across runs even for large \(B\), suspect a moment problem.

  • Running mean doesn’t settle: plot \(\bar{X}_B\) as a function of \(B\). Convergence should be visually apparent; persistent drift or jumps suggest trouble.

  • Sensitivity to extreme draws: if removing the largest \(1\%\) of \(|g(X_b)|\) values substantially changes \(\bar{g}_B\), the tails are driving the estimate.

Rerunning with Different Seeds

seeds <- 1:25
B_diag <- 100000

# Stopping time: stable across seeds
means_stop <- sapply(seeds, function(s) {
  set.seed(s); mean(replicate(B_diag, run_until_4H()))
})

# Cauchy: unstable across seeds
means_cauchy <- sapply(seeds, function(s) {
  set.seed(s); mean(rcauchy(B_diag))
})

df_diag <- data.frame(
  seed = rep(seeds, 2),
  mean = c(means_stop, means_cauchy),
  dist = rep(c("Stopping Time — E[X] = 30",
               "Cauchy — E[X] does not exist"), each = length(seeds))
)

ggplot(df_diag, aes(x = factor(seed), y = mean)) +
  geom_point(size = 2, color = "steelblue") +
  geom_hline(data = data.frame(
    dist = c("Stopping Time — E[X] = 30",
             "Cauchy — E[X] does not exist"),
    yint = c(30, 0)),
  aes(yintercept = yint), linetype = "dashed") +
  facet_wrap(~ dist, scales = "free_y") +
  labs(x = "Seed", y = expression(bar(X)[B]),
  title = paste0("MC estimate across 25 independent runs (B = ",
               formatC(B_diag, format = "d", big.mark = ","), ")")) +
  theme_minimal() +
  theme(axis.text.x = element_text(size = 7))

Diagnostics: Running Mean

set.seed(51)

B_diag <- 100000
X_stop_diag <- replicate(B_diag, run_until_4H())
X_cauchy_diag <- rcauchy(B_diag)

df_running <- data.frame(
  B = rep(1:B_diag, 2),
  cum_mean = c(cumsum(X_stop_diag) / seq_along(X_stop_diag),
               cumsum(X_cauchy_diag) / seq_along(X_cauchy_diag)),
  dist = rep(c("Stopping Time — E[X] = 30",
               "Cauchy — E[X] does not exist"), each = B_diag)
)

print(ggplot(df_running, aes(x = B, y = cum_mean)) +
  geom_line(color = "steelblue", linewidth = 0.4) +
  geom_hline(data = data.frame(
    dist = c("Stopping Time — E[X] = 30",
             "Cauchy — E[X] does not exist"),
    yint = c(30, 0)),
    aes(yintercept = yint), linetype = "dashed", color = "red") +
  facet_wrap(~ dist, scales = "free_y") +
  scale_x_continuous(labels = scales::comma, expand = c(0, 0)) +
  labs(x = "B (number of replications)",
       y = expression(bar(X)[B]),
       title = "Cumulative Average as Diagnostic") +
  theme_minimal())
  • Left: \(\bar{X}_B\) visibly settles, reassuring.
  • Right: \(\bar{X}_B\) still drifting at \(B = 100{,}000\) — a red flag.

Sensitivity to Trimming

X_c <- X_cauchy_diag  

# Full sample vs. trimmed
full_mean  <- mean(X_c)
trim_99    <- mean(X_c[abs(X_c) <= quantile(abs(X_c), 0.99)])
trim_95    <- mean(X_c[abs(X_c) <= quantile(abs(X_c), 0.95)])

# Same for stopping time
full_mean_s  <- mean(X_stop)
trim_99_s    <- mean(X_stop[X_stop <= quantile(X_stop, 0.99)])
trim_95_s    <- mean(X_stop[X_stop <= quantile(X_stop, 0.95)])

df_trim <- data.frame(
  Trim = rep(c("None", "Top 1%", "Top 5%"), 2),
  Mean = c(full_mean, trim_99, trim_95,
           full_mean_s, trim_99_s, trim_95_s),
  Dist = rep(c("Cauchy", "Stopping Time"), each = 3)
)
df_trim$Trim <- factor(df_trim$Trim, levels = c("None", "Top 1%", "Top 5%"))

print(ggplot(df_trim, aes(x = Trim, y = Mean)) +
  geom_col(fill = "steelblue", alpha = 0.7, width = 0.5) +
  geom_hline(data = data.frame(Dist = c("Cauchy", "Stopping Time"),
                                yint = c(0, 30)),
             aes(yintercept = yint), linetype = "dashed") +
  facet_wrap(~ Dist, scales = "free_y") +
  labs(x = "Observations Trimmed",
       y = expression(bar(X)),
       title = "Sensitivity of MC Estimate to Trimming Extreme Values") +
  theme_minimal())
  • For the stopping time, trimming barely matters. For the Cauchy, the estimate shifts substantially — a red flag that the mean may not exist.

Future Lectures

  • Next Lecture: Using MC simulations to evaluate finite-sample performance of estimators and inference procedures.

  • Future Lectures:

    • Use of MC simulation in implementing randomization inference.

    • Use of MC simulation in implementing bootstrap inference.